With the exponential growth of product reviews posted on the web and on
social media, a great deal of research has been done on understanding the
purchasing behavior of customers. This paper uses Twitter for sentiment
analysis to understand customer purchasing behavior. There has been a
significant increase in e-commerce, particularly in people purchasing
products on the internet. As a result, it has become a fertile source for
opinion analysis and opinion mining. In this investigation, we look at the
problem of recognizing and predicting a customer's purchase intention for
an item. Sentiment analysis helps to arrive at a more conclusive outcome.
In this study, the support vector machine, naive Bayes, and logistic
regression methods are investigated for understanding the customer's
sentiment or opinion on a specific product. These strategies have been
demonstrated to be effective for making predictions using the analysis
models which examine the customer's opinion/sentiment the most precisely.
The accuracy of each machine learning algorithm will be analyzed, and the
algorithm which is the most accurate will be viewed as ideal.

Keywords: Machine learning; Sentimental analysis; Twitter dataset analysis

1. INTRODUCTION

Social media has become one of the most important channels for communication and content
generation. It serves as a unified platform for users to express their thoughts on subjects
ranging from their daily lives to their opinions on companies and products. This, in turn, has made it a
significant resource for mining user opinions for tasks ranging from predicting the performance of films to
the results of stock market trading and elections. Even though the vast majority of people are reluctant to
answer surveys about products or services, they express their thoughts freely on social media and
wield a huge influence in shaping the opinions of other consumers. These customer voices can influence
brand awareness, brand loyalty and brand advocacy. Therefore, it is essential that big companies pay more
attention to mining customer opinions related to their brands and products from social media. With
social media monitoring, they will be able to take advantage of consumer insights to
improve their product quality, offer better service, drive sales, and even identify new business opportunities.
What is more, they can reduce customer care costs by responding to their customers through these social media
channels, as half of customers prefer reaching service providers on social media as opposed to a
call center.

Analyzing customers' opinions expressed on social media is an excellent tool for enterprises, as it
requires no explicit questions and therefore often reflects customers' true feelings.
Although it has drawbacks with respect to the population sampled, it can be
used to infer public opinion. The objective of this research is to build a system that
can give accurate results, helping brands to see how customers are responding to a specific product. Nowadays
social networks, blogs, and other media produce an enormous amount of data on
the Internet. This tremendous amount of data contains crucial sentiment-related information that can be
used to benefit businesses and other aspects of commercial and scientific industries. Manually tracking and
extracting this useful information from this massive amount of data is practically impossible.
Sentiment analysis of user posts is required to help make business decisions. It is a process which extracts
sentiments or opinions from reviews which are given by users on a specific subject, area, or product on the
web. Sentiment may be divided into two types, i) positive or ii) negative, which determine an individual's overall
attitude toward a given subject. Predicting the sentiment of a tweet is our main priority. Purchase intentions
are often assessed and used by marketing managers as an input to decisions regarding new and
existing products and services. To date, many businesses have used customer survey frameworks in
which they ask questions such as, "How likely are you to buy an item in a certain time span?" and then use
that data to calculate the purchase intention. We want to see whether we can use tweets to train a model that
can identify tweets that indicate a purchase intention for a product.
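
As an illustration of the classification task described above, the following minimal sketch trains a
multinomial naive Bayes classifier with Laplace smoothing on a handful of tweets and predicts whether a new
tweet expresses a purchase intention. The tweets, the label names (`intent`, `no_intent`), and the function
names are assumptions made up for this example; they are not taken from any dataset used in the surveyed works.

```python
import math
from collections import Counter, defaultdict

# Toy, hand-written tweets; texts and labels are illustrative assumptions.
TRAIN = [
    ("i want to buy the new phone", "intent"),
    ("going to order this laptop tomorrow", "intent"),
    ("cannot wait to purchase the headphones", "intent"),
    ("the weather is nice today", "no_intent"),
    ("watching a movie with friends", "no_intent"),
    ("my phone battery died again", "no_intent"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothed log likelihood of the word.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "i want to order the laptop"))
```

A real system would need a far larger labeled corpus and a proper tokenizer; the sketch only shows the shape
of the train-then-predict workflow.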


2. RELATED WORKS 

Tan et al. [1] proposed interpreting public sentiment variation in order to further understand the
reasons behind shifts of public opinion on products or even people. They proposed two
models: foreground and background latent Dirichlet allocation (LDA), which filters out background topics that
have no significance in the most recent public sentiment variation, and reason candidate and
background LDA, which ranks the various reasons based on their "popularity" in the given period. The approach also
employed Gibbs sampling since it was simple to extend and had been shown to be a successful approach. A sentiment
analysis tool for slang word translation was also used, which could translate slang into standard terms
and may help accuracy. They used data from the Stanford Network Analysis Platform.
The suggested approach outperformed previous models in terms of accuracy and might be used for product
evaluations, scientific publications, and many other applications; it was also the first effort to assess public
sentiment changes. Xia et al. [2] developed dual sentiment analysis to solve the polarity shift problem in
sentiment analysis, in which the sentiment of a text is reversed but the text is otherwise treated the same in a
typical model. To address the polarity shift, they proposed dual training and dual prediction algorithms that assess both
original and reversed data in order to capture not only how positive or negative the original data is, but
also how positive or negative the reversed data is. They also extended their polarity framework to a three-class
structure that includes neutral data. They created language-independent pseudo-antonym dictionaries to
lessen the reliance on external antonym dictionaries. Support vector machine (SVM), naive Bayes, and
logistic regression classifiers were used, and it was found that they exceeded the baseline by 3.0% and 1.7%
on average, respectively. Hamroun et al. [3] advocated customer intention analysis based on latent semantics
instead of current models that rely on polarity terms and matching phrases and may fail when opinions are
expressed through latent semantics. They combined OpenNLP, W3C Web Ontology Language
(OWL) ontologies, and WordNet natural language processing tools with additional meanings. Their
strategy was to automatically extract patterns from Twitter for customer intention research. The idea is to
use domain ontology for two key purposes: creating ontology representations and using ontology
representations in pattern learning. They used five distinct datasets, with the customer intention (CI)
patterns outperforming the baseline by 3-6% on average.
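
The data-expansion idea behind the dual training attributed to Xia et al. [2] above can be sketched as follows:
each training review is paired with a sentiment-reversed copy carrying the opposite label. This is a simplified
illustration under stated assumptions, not the authors' implementation. The tiny `ANTONYMS` dictionary, the
negation list, and the one-word negation scope are all hypothetical simplifications.

```python
# Hypothetical, hand-made antonym dictionary (an assumption for illustration).
ANTONYMS = {
    "good": "bad", "bad": "good",
    "love": "hate", "hate": "love",
    "like": "dislike", "dislike": "like",
}
NEGATIONS = {"not", "no", "never"}
FLIP_LABEL = {"positive": "negative", "negative": "positive"}


def reverse_review(text, label):
    """Build a sentiment-reversed copy of a review and flip its label.

    Assumed rules: a negation word is dropped and the word it negates is kept
    unchanged ("not like" -> "like"); any other sentiment word is replaced by
    its antonym ("love" -> "hate").
    """
    tokens = text.lower().split()
    reversed_tokens = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in NEGATIONS and i + 1 < len(tokens):
            reversed_tokens.append(tokens[i + 1])  # keep negated word as-is
            i += 2
        else:
            reversed_tokens.append(ANTONYMS.get(tok, tok))
            i += 1
    return " ".join(reversed_tokens), FLIP_LABEL[label]


def expand_training_set(samples):
    """Return every original sample followed by its reversed counterpart."""
    expanded = []
    for text, label in samples:
        expanded.append((text, label))
        expanded.append(reverse_review(text, label))
    return expanded


for sample in expand_training_set([("i do not like this phone", "negative")]):
    print(sample)
```

A classifier trained on the expanded set sees both polarities of each sentence, which is the intuition behind
assessing original and reversed data together.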

Li et al. [4] proposed combining two models, sentiment-specific word embeddings (SSWE) and a weighted
text feature model (WTFM), because the majority of conventional models are either lexicon-based or machine
learning-based. Instead of directly using the word embeddings approach, vectors are first
constructed in order to avoid missing semantic cues and to improve semantic classification.
The weighted text feature model generates two sorts of features: the first is a negation feature based on
negation terms, and the second is generated by computing the similarity of tweets and their polarity. The
suggested strategy outperformed the previous model as well as SSWE and WTFM taken
separately; moreover, when SSWE + word2vec was used, the
performance was extremely close to that of SSWE. Tweepy, a Python library for the Twitter application
programming interface (API), was used to generate the dataset. Ren and Wu [5] created a lexicon-based,
language-dependent learning method to predict unknown user-topic opinions. They attempted to incorporate topical and social
information into the existing prediction model mathematically. They examined the association between
social and topical context after applying an appropriate hypothesis and used topic content similarity
(TCS) to quantify it. The findings showed that the suggested ScTcMF framework was superior
to the existing one. The scope of the project was limited to Twitter, and the dataset was also collected from the Twitter API.
Chen et al. [6] evaluated a narrowly constrained project that focused only on the difficulties engineering
students faced during their program. Naive Bayes and multi-label classification algorithms were employed.
The method combined qualitative analysis with large-scale data mining
approaches. It is a machine learning method that is also language dependent. It was founded on the notion
that informal social media data might give additional information about students' experiences. Purdue
University provided the tweets, which covered subjects ranging from sleep deprivation to food. The dataset
was collected from the Twitter API using Tweepy. Bollegala et al. [7] addressed the mismatch problem between the
training dataset and the target dataset: when a model has been trained on selected words and the test data
does not contain those words, a mismatch arises. To overcome this mismatch, they developed
a cross-domain sentiment classifier that uses previously extracted sentiment-sensitive words, and
they determined that existing resources such as SentiWordNet, a lexical resource, were
outperformed by the cross-domain classifier. It also uses a lexicon-based approach and is a language-dependent
model aimed mainly at product reviews; the dataset was taken from amazon.com.

Lin et al. [8] presented a joint sentiment analysis model as well as a reparametrized version of the
supervised joint sentiment-topic model, because it was frequently observed that the weakly supervised joint
sentiment-topic model, which is based on LDA, failed to produce acceptable performance when shifting to
new domains. As a result, their model can recognize both the sentiment and the topic of a given dataset.
It is a machine learning method that is also language dependent. The dataset came from Amazon.com and
IMDB.com and was based on product or movie reviews. Wang et al. [9] proposed that for complete
sentiment analysis of a tweet, hashtags should also be considered as complete words, and that three types of
information are required to generate the complete sentiment polarity for a hashtag, which differs from
sentence-level and document-level sentiment analysis. They also suggested using improved boosting classification, which
would allow the literal meaning of hashtags to be used as a semi-supervised training set. To construct the
hashtag sentiment, they used an SVM classifier; it was a language-dependent model for the Twitter
dataset. Mudinas et al. [10] assessed both lexicon-only and learning-only approaches and presented a hybrid
strategy that takes the best of both. In their
experiments, they found that the sentiment polarity classification and sentiment strength detection results
of their pSenti system were very close to those of the pure learning model and higher than those of the pure
lexicon model. It was language-specific and used both machine learning and lexical models. This model was
created for software and movie reviews, using data from CNET and the Internet Movie
Database (IMDB). Yu et al. [11] built their research around a movie-domain case study and assessed
the difficulty of forecasting sales using sentiment analysis. They investigated several hidden sentiment
factors, using sentiment probabilistic latent semantic analysis (S-PLSA) to evaluate
complicated forms of sentiment. They then suggested an updated version of the autoregressive
sentiment-aware model to boost accuracy. It was a language-dependent, machine-learning-based model that focused on
sales prediction in a movie-based case study. The dataset was derived from the Twitter API using Tweepy and
was created exclusively for Twitter.
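
To make the lexicon-only side of the lexicon versus learning comparison above concrete, the following sketch
scores a text by summing word polarities from a small dictionary, flipping the sign of the first sentiment
word that follows a negation. The `LEXICON` values and the negation rule are hypothetical choices for this
example; it does not reproduce pSenti or any other surveyed system, which rely on much larger resources such as
SentiWordNet.

```python
# Hypothetical mini-lexicon: word -> polarity strength (assumed values).
LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text):
    """Sum the polarities of known words; a negation flips the next sentiment word."""
    score = 0
    negate = False
    for tok in text.lower().split():
        if tok in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(tok, 0)
        if value:
            score += -value if negate else value
            negate = False  # the negation has been consumed
    return score


def polarity(text):
    """Map the numeric score to a three-way label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("not good"), lexicon_score("great camera"))
```

The magnitude of the score gives a rough notion of sentiment strength, while its sign gives the polarity; a
hybrid approach would feed such lexicon signals into a learned classifier.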

Jose and Chooralil [12] addressed the problem of selecting just one algorithm
for sentiment analysis by combining machine learning algorithms
with lexicon-based algorithms in an ensemble that chooses the appropriate algorithm for each use, so as to remove the
risk of selecting an inappropriate classifier. They chose the SentiWordNet classifier, naive Bayes classifier, and
hidden Markov model classifier, and the combination proved to be more accurate. After analyzing sentiment
classification on numerous tweets, they concluded that their ensemble technique produced an accuracy of
roughly 71.48%, which was higher than that of any of the three classifiers individually. Kouloumpis et al. [13] recommended
using Twitter hashtags to achieve more accurate sentiment analysis, since hashtags and emoticons may
add significantly to model accuracy. In contrast to basic sentiment or non-sentiment analysis,
they employed a three-way classifier. They concentrated on n-gram features,
lexicon features, and part-of-speech features. They employed three datasets: the
hashtagged dataset from the Edinburgh Twitter Corpus and the emoticon dataset from twittersentiment.appspot.com
for development and training, and a dataset from the iSieve company for evaluation. They found that combining the
n-gram, lexicon, and microblogging features resulted in an accuracy of 74-75%. Park and Seo [14] used
sentiment analysis to rank three AI assistants, Siri from Apple, Cortana from Microsoft, and Google
Assistant from Google, based on user feedback. They evaluated tweets using the valence aware dictionary and
sentiment reasoner (VADER), and used the Kruskal-Wallis test and the Mann-Whitney test to determine statistical
significance between groups. They employed null hypotheses and the t-test to determine how the similarity
of the various assistants varied over time.

A comprehensive analysis of consumer decisions on Twitter dataset ... (Vigneshwaran Pandi)

1088 ISSN: 2252-8938

Prakruthi et al. [15] assessed people's feelings towards a person, trend, product, or brand. The Twitter
API is used to directly retrieve tweets from Twitter and construct sentiment classifications for the tweets. The
data are categorized and represented using a histogram and a pie chart. The pie chart depicts the percentage of
positive, negative, and neutral attitudes, reported as roughly 65% positive, 20% negative, and 15%
neutral. The histograms depict positive, negative, and neutral emotion. Go et al. [16] tested many
models and performed trials to identify the best classifier for organizations that wish to analyze the
sentiment toward their products. Tweets with emoticons serve as training data. Three classifiers were used:
naive Bayes, maximum entropy, and SVM; all methods had an accuracy of more than 80% when trained
using emoticon data. However, the SVM was the most accurate, with an accuracy of 85%.
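
The use of emoticon-bearing tweets as training data, as described for Go et al. [16] above, can be sketched as
a simple labeling step: a tweet with only positive emoticons is labeled positive, a tweet with only negative
emoticons is labeled negative, and everything else is skipped. The emoticon lists, the function name, and the
choice to strip the emoticon from the text are assumptions of this sketch rather than a description of the
authors' exact procedure.

```python
# Assumed emoticon lists, kept short for illustration.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(")


def label_by_emoticon(tweet):
    """Return (cleaned_text, label), or None if the tweet gives no clear signal.

    Tweets with no emoticon, or with both positive and negative emoticons,
    are treated as unusable for training.
    """
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None
    label = "positive" if has_pos else "negative"
    cleaned = tweet
    # Strip the emoticons so a classifier cannot simply memorize them.
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        cleaned = cleaned.replace(emoticon, "")
    return " ".join(cleaned.split()), label


tweets = ["love my new phone :)", "battery died again :(", "just landed"]
training_set = [r for r in map(label_by_emoticon, tweets) if r is not None]
print(training_set)
```

Labels obtained this way are noisy, so the resulting set is best seen as weakly labeled training data rather
than a gold standard.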

Trupthi et al. [17] aimed to perform real-time sentiment analysis on tweets retrieved from Twitter and
present the results to the user. The tools and processes used are natural language processing (NLP), naive
Bayes, and the Twitter API. NLP is used to remove tagged words that are
not helpful for building the classifier. The tweets retrieved by the Streaming API are then classified as
positive, negative, or neutral. The analytics for the word "nepotism" show that the
Twitterverse feels mostly negative about nepotism, while the results for the word "education" were mostly positive.
Karthika et al. [18] evaluated different models and conducted experiments to find the best classifier
for analyzing reviews from the shopping site Amazon. Based on those reviews the product is classified as
positive, negative, or neutral. The algorithms used are random forest and SVM. Random forest gave the
best accuracy at 84%, while SVM showed 81% accuracy. The dataset contains reviews of 7 different
products. Ramalingam et al. [19] tested numerous models and performed trials to discover the best classifier
for identifying common traits among depressed persons and detecting them using various machine
learning methods. The algorithms are intended to examine tweets for emotion detection as well as the
identification of suicidal ideation among social media users. Logistic regression, SVM, and random forest are
the algorithms employed. The goal of these strategies is to leverage data available on Twitter and other
social media to predict people's mindsets by studying their social media posts. Compared to
logistic regression and random forest, SVM has the highest accuracy of 82.5%. Singh and Kumar [20]
analyzed numerous models and conducted trials to determine the best method for predicting heart disease
using various machine learning techniques. K-nearest neighbor (KNN), decision tree, linear regression, and SVM are
the approaches. Jupyter notebook is employed as the simulation tool. The dataset contains 14
variables such as sex, age, blood sugar, and so on. The accuracy of the algorithms was found to be
87%, 79%, 78%, and 83%, respectively. As a result, KNN is the most accurate. Sujath et
al. [21] tested many models and performed tests to determine the optimal method for analyzing the impact of
COVID-19 on the stock market, that is, which method provides
the most accurate prediction of that impact. The algorithms are random
forest, linear regression, and SVM. The dataset was obtained from Kaggle. SVM was found to have the
highest accuracy of 82%.

Mujumdar and Vaidehi [22] analyzed different models and conducted experiments to find the
best algorithm to predict diabetes among patients. The dataset contains 800 records and 10 attributes.
The algorithms used are decision tree, logistic regression and KNN. Logistic regression shows the highest
accuracy with 96%, compared to the other two which show only 90% and 86% accuracy. Huq et al. [23]
examined many models and performed tests to determine the best algorithm to predict the sentiment of a
tweet on social media, i.e., whether it is positive, negative, or neutral. It generally focuses on the tweet's
wording and sentiment. KNN and SVM are the algorithms applied. The dataset was obtained
from the website Kaggle. According to the research, KNN is the most accurate, with an accuracy rate of 84%.
Lassen et al. [24] examined many models and performed trials to determine the best algorithm to forecast
iPhone sales based on tweets. The tweets are categorized as positive, negative, or neutral. The dataset
contains 400 million tweets from 2007-2010. Predictions are performed using linear regression and
multiple regression models. Multiple regression has the smallest gap between predicted and actual sales
(5-10%), making it the most accurate. Dhir and Raj [25] examined many models and performed trials to
determine the best algorithm for predicting movie performance. They analyzed the Internet Movie
Database (IMDB) and estimated the IMDB score, as well as how it influences the movie collection. Logistic
regression, decision tree, and random forest are the methods employed. With 61% accuracy,
random forest is the best. The study shows that social media likes, the number of voted users, and the length all
have a significant impact on the IMDB score. Labib et al. [26] used machine learning methodologies to
examine multiple models and perform tests to discover the optimal algorithm for analyzing traffic incidents to
predict the severity of accidents. The algorithms employed include naive Bayes, decision trees,
KNN, and AdaBoost. The model classifies the severity of incidents as fatal, serious, or minor injury. AdaBoost has
the highest accuracy rate of 80%. It also revealed that accidents are more common at no-joint exits and T
intersections. Wongkar and Angdresey [27] created a model for the 2019 presidential election using
Python and the naive Bayes, SVM, and K-NN classifiers. Crawlers were employed to get tweets from
Twitter, which were then tokenized to discover significant terms. They found that naive Bayes was the most
accurate, with an accuracy of 75-76%.

Gamon [28] proposed performing sentiment analysis even on noisy data by using large feature
vectors with feature reduction. As customer feedback is received in very large volumes, reacting
to it quickly requires an efficient model to classify the feedback as positive, negative, or neutral. The study
used NLPW in natural language processing for linguistic analysis. The accuracy at the end was 85.47%.
Kusrini and Mashuri [29] proposed two classifiers, SVM and naive Bayes, and compared them to
understand which classifier gives the best result. The approach first takes the dataset, uses tokenization to separate the
words, removes slang, and then applies stemming in Python to reduce the volume of data. The
accuracy at the end was around 82-83%. Mandloi and Patel [30] proposed using three different classifiers,
namely SVM, naive Bayes and maximum entropy classification, to understand users' sentiment towards
products and movies and people's alignment with political parties. To extract the data,
they used three feature types, namely unigram, bi-gram and n-gram features, and the accuracy came out to be 85%
for naive Bayes.
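
The preprocessing steps mentioned above (tokenization, slang removal, stemming) and the unigram and bi-gram
features can be sketched as one small pipeline. The `SLANG` map, the suffix list, and the crude suffix-stripping
stemmer are hypothetical stand-ins chosen for this example; a production system would more plausibly use an
established stemmer such as the Porter stemmer.

```python
import re

# Hypothetical slang map and suffix list (assumptions for illustration).
SLANG = {"gr8": "great", "luv": "love", "u": "you"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def normalize_slang(tokens):
    """Replace known slang tokens with their standard form."""
    return [SLANG.get(tok, tok) for tok in tokens]


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Tokenize, normalize slang, stem, then emit 1-grams up to max_n-grams."""
    tokens = [crude_stem(tok) for tok in normalize_slang(tokenize(text))]
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features


print(extract_features("Luv the gr8 cameras"))
```

The resulting feature list can be turned into count or presence vectors and passed to any of the classifiers
discussed in this section.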


3. COMPARISON ANALYSIS 

Table 1 (as seen in Appendix) shows the comparison of existing systems. To summarize the
existing works on sentiment analysis that we have reviewed, they can be divided into four categories:
document-level, sentence-level, phrase-level, and aspect-level. The existing papers either
tackled one of the four types or combined them. Some incorporated hashtags, some
incorporated emoticons, some were language dependent, and some were language independent. Some
had greater accuracy but could tackle only one of the types, while others had lower accuracy but could
incorporate more; some even tried building a complete corpus-based antonym dictionary.

Overall, there is still a lot to explore in order to use opinion mining to its fullest potential. In
our model, we will take the three best performing algorithms, which are SVM, naive Bayes, and
logistic regression, to build a model that allows enterprises to understand how well their
products are performing, what shortcomings customers perceived, what could be better, and more. The
proposed system is expected to be more efficient than the existing systems.
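
The selection step implied above, comparing the three classifiers on the same labeled tweets and keeping the
most accurate one, can be sketched as follows. The gold labels and the per-model predictions are made-up
placeholders used only to exercise the code; they are not experimental results and say nothing about which
algorithm actually performs best.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def most_accurate(y_true, predictions_by_model):
    """Return the best model name and the accuracy of every model."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical gold labels and model outputs, invented for illustration.
Y_TRUE = ["pos", "neg", "pos", "neg", "pos"]
PREDICTIONS = {
    "svm": ["pos", "neg", "pos", "neg", "neg"],
    "naive_bayes": ["pos", "pos", "pos", "neg", "neg"],
    "logistic_regression": ["neg", "neg", "pos", "pos", "neg"],
}

best, scores = most_accurate(Y_TRUE, PREDICTIONS)
print(best, scores)
```

In practice the predictions would come from models evaluated on a held-out test split, and accuracy would
usually be reported alongside precision and recall when the classes are imbalanced.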


4. CONCLUSION AND FUTURE WORK 

This article discusses a number of machine learning methods, including naive Bayes, SVM,
logistic regression, and random forest. After extensive research, we found that SVM, naive Bayes, and
logistic regression may be used to develop a model for our project that will be more accurate
than the present one, as suggested by the publications above. Twitter is widely
analyzed to mine the opinions of users or customers in order to bring in potential customers or to enhance
products or services. Hence, it has become very important to constantly evolve and bring out even more
accurate models. This work will help enterprises to form a basic idea of how customers are reacting
to their products, which will then help them to make the products even better. This may help enterprises to leave
behind the traditional method of feedback forms, which is not very accurate.

People now have the opportunity to organize the relentless growth of information from social
networks. Because virtually all real, complex problems, ranging from natural to mechanical in nature,
may be addressed via social media, its challenges should be heard. Rumor detection, evaluation repetition,
patterns of online conversations resulting in riotous circumstances, and online shaming all shift assumptions,
allowing us to understand social pervasiveness in the form of likes, shares, and retweets. Finding the
right content and the right time to publish are two of the most important difficulties that need to be addressed in
social networks before they are fully integrated into people's lives. Indeed, even the detection of fraudulent
comments should be attended to at the finest level on social platforms like Twitter to avoid unnecessary harassment
from spammers. Medical issues of genuine concern should be addressed in further study so that they have
a strong impact on social media users. It would be appropriate at this point to prepare a unified
model that understands the opinions of a customer when she/he is commenting on social media.

Table 1. Performance analysis comparison of existing systems

| Name of the authors | Year of publication | Methodology | Algorithms used | Accuracy |
|---|---|---|---|---|
| Tan et al. [1] | 2014 | Foreground and Background LDA, Reason Candidate and Background LDA | Gibbs sampling, Parameter estimation, Average word entropy | 69.70% |
| Xia et al. [2] | 2015 | Dual Training and Dual Prediction along with corpus-based antonym dictionary | Naive Bayes, SVM, logistic regression | 85-87% |
| Hamroun et al. [3] | 2015 | OPEN NLP, WordNet, OWL ontology | CI patterns | 712% |
| Li et al. [4] | 2016 | LibLinear Model and RNDN | N-gram, SSWE, WTFM | 66.8 |
| Ren and Wu [5] | 2013 | The social and topical contexts Factorization of Matrixes (ScTcMF) | Breadth-first search, user topic opinion labelling | 60.35 |
| Chen et al. [6] | 2014 | Use informal social media data to provide insights | Naive Bayes, multilevel classification | 61% |
| Bollegala et al. [7] | 2013 | SentiWordNet lexica classifier, corpus based | Cross-domain sentiment classification | 80% |
| Lin et al. [8] | 2012 | To identify sentiment and topic from text at the same time | Joint sentiment-topic (JST) model with weak supervision based on latent Dirichlet allocation (LDA, Reverse-JST) | 71.20% |
| Wang et al. [9] | 2011 | To automatically create the overall sentiment polarity for a specific hashtag during a specified time period, which differs significantly from the typical sentence-level and document-level sentiment polarities | SVM classifier | 16% |
| Mudinas et al. [10] | 2012 | To classify polarity and detect sentiment strength | A hybrid strategy (lexicon-based + machine learning) was used | 771% |
| Yu et al. [11] | 2012 | To predict sales performance | Sentiment S-PLSA (an Autoregressive Sentiment and Quality Aware model) | 73% |
| Jose and Chooralil [12] | 2016 | Three-way classifier unlike simple sentiment or non-sentiment analysis | n-gram feature, lexicon feature and part of speech feature | 75% |
| Kouloumpis et al. [13] | 2011 | Three-way classifier unlike simple sentiment or non-sentiment analysis | n-gram feature, lexicon feature and part of speech feature | 75% |
| Park and Seo [14] | 2018 | Three AI assistants namely Siri by Apple, Cortana by Microsoft and Google Assistant by Google using sentiment analysis | VADER, Kruskal Wallis test and Mann-Whitney test | 1% |
| Prakruthi et al. [15] | 2018 | Sentiment classification for the tweets using Histogram and Pie Chart | Bag of Words algorithm | 68% |
| Go et al. [16] | 2009 | Unigrams, Bi-grams, and parts of speech to use emoticons | Naive Bayes, SVM | 81% |
| Trupthi et al. [17] | 2017 | Natural Language Processing - NLTK | Naive Bayes classification | 74% |
| Karthika et al. [18] | 2019 | Receiver Operating Characteristic (ROC) curve to evaluate classifier output | Random forest algorithm, SVM | 84% |
| Ramalingam et al. [19] | 2019 | Machine learning and lexicon-based techniques to opinion mining, as well as assessment metrics | Logistic regression, SVM, and random forest | 82.50% |
| Singh and Kumar [20] | 2020 | Machine learning algorithms' accuracy in predicting heart disease | k-nearest neighbor, decision tree, linear regression, and support vector machine | 87% |
| Sujath et al. [21] | 2020 | Forecasting model for COVID-19 pandemic | decision tree, logistic regression and KNN | 96% |
| Mujumdar and Vaidehi [22] | 2019 | Best algorithm to predict diabetes among patients | decision tree, logistic regression and KNN | 96% |
| Huq et al. [23] | 2017 | To predict the sentiment of a tweet on social media | KNN and SVM classifiers | 84% |
| Lassen et al. [24] | 2014 | Predict iPhone sales using tweets based on iPhone | linear regression and multiple regression models | 10% |
| Dhir and Raj [25] | 2018 | Movie success prediction | logistic regression, decision tree and random forest | 61% |
| Labib et al. [26] | 2019 | Determine the intensity of accidents | naive Bayes, decision trees, KNN and AdaBoost | 80% |
| Wongkar and Angdresey [27] | 2019 | Data collection utilizing Python libraries, text processing, testing training data, and text categorization | Naive Bayes classifier, SVM classifier and K-NN classifier | 76% |
| Gamon [28] | 2004 | Train linear SVMs to obtain high classification accuracy on difficult-to-classify data | NLPW in natural language processing for linguistic analysis | 85% |
| Kusrini and Mashuri [29] | 2019 | Lexicon Based and Polarity Multiplication | SVM, naive Bayes | 83% |
| Mandloi and Patel [30] | 2020 | Three features namely unigram, bi-gram and n-gram features | SVM, naive Bayes and Maximum Entropy | 85% |